Goto

Collaborating Authors

 tree method


Deep Trees for (Un)structured Data: Tractability, Performance, and Interpretability

arXiv.org Artificial Intelligence

Decision Trees have remained a popular machine learning method for tabular datasets, mainly due to their interpretability. However, they lack the expressiveness needed to handle highly nonlinear or unstructured datasets. Motivated by recent advances in tree-based machine learning (ML) techniques and first-order optimization methods, we introduce Generalized Soft Trees (GSTs), which extend soft decision trees (STs) and are capable of processing images directly. We demonstrate their advantages with respect to tractability, performance, and interpretability. We develop a tractable approach to growing GSTs, given by the DeepTree algorithm, which, in addition to new regularization terms, produces high-quality models with far fewer nodes and greater interpretability than traditional soft trees. We test the performance of our GSTs on benchmark tabular and image datasets, including MIMIC-IV, MNIST, Fashion MNIST, CIFAR-10 and Celeb-A. We show that our approach outperforms other popular tree methods (CART, Random Forests, XGBoost) in almost all of the datasets, with Convolutional Trees having a significant edge in the hardest CIFAR-10 and Fashion MNIST datasets. Finally, we explore the interpretability of our GSTs and find that even the most complex GSTs are considerably more interpretable than deep neural networks. Overall, our approach of Generalized Soft Trees provides a tractable method that is high-performing on (un)structured datasets and preserves interpretability more than traditional deep learning methods.


Ranking Perspective for Tree-based Methods with Applications to Symbolic Feature Selection

arXiv.org Machine Learning

Tree-based methods are powerful nonparametric techniques in statistics and machine learning. However, their effectiveness, particularly in finite-sample settings, is not fully understood. Recent applications have revealed their surprising ability to distinguish transformations (which we call symbolic feature selection) that remain obscure under current theoretical understanding. This work provides a finite-sample analysis of tree-based methods from a ranking perspective. We link oracle partitions in tree methods to response rankings at local splits, offering new insights into their finite-sample behavior in regression and feature selection tasks. Building on this local ranking perspective, we extend our analysis in two ways: (i) We examine the global ranking performance of individual trees and ensembles, including Classification and Regression Trees (CART) and Bayesian Additive Regression Trees (BART), providing finite-sample oracle bounds, ranking consistency, and posterior contraction results. (ii) Inspired by the ranking perspective, we propose concordant divergence statistics $\mathcal{T}_0$ to evaluate symbolic feature mappings and establish their properties. Numerical experiments demonstrate the competitive performance of these statistics in symbolic feature selection tasks compared to existing methods.


Optimal Survival Trees: A Dynamic Programming Approach

arXiv.org Artificial Intelligence

Survival analysis studies and predicts the time of death, or other singular unrepeated events, based on historical data, while the true time of death for some instances is unknown. Survival trees enable the discovery of complex nonlinear relations in a compact human comprehensible model, by recursively splitting the population and predicting a distinct survival distribution in each leaf node. We use dynamic programming to provide the first survival tree method with optimality guarantees, enabling the assessment of the optimality gap of heuristics. We improve the scalability of our method through a special algorithm for computing trees up to depth two. The experiments show that our method's run time even outperforms some heuristics for realistic cases while obtaining similar out-of-sample performance with the state-of-the-art.


Tree Methods for Hierarchical Classification in Parallel

arXiv.org Artificial Intelligence

We propose methods that enable efficient hierarchical classification in parallel. Our methods transform a batch of classification scores and labels, corresponding to given nodes in a semantic tree, to scores and labels corresponding to all nodes in the ancestral paths going down the tree to every given node, relying only on tensor operations that execute efficiently on hardware accelerators. We implement our methods and test them on current hardware accelerators with a tree incorporating all English-language synsets in WordNet 3.0, spanning 117,659 classes in 20 levels of depth. We transform batches of scores and labels to their respective ancestral paths, incurring negligible computation and consuming only a fixed 0.04GB of memory over the footprint of data.


Harnessing Interpretable Machine Learning for Holistic Inverse Design of Origami

arXiv.org Artificial Intelligence

This work harnesses interpretable machine learning methods to address the challenging inverse design problem of origami-inspired systems. We show that a decision tree-random forest method is particularly suitable for fitting origami databases, containing both design features and functional performance, to generate human-understandable decision rules for the inverse design of functional origami. First, the tree method is unique because it can handle complex interactions between categorical features and continuous features, allowing it to compare different origami patterns for a design. Second, this interpretable method can tackle multi-objective problems for designing functional origami with multiple and multi-physical performance targets. Finally, the method can extend existing shape-fitting algorithms for origami to consider non-geometrical performance. The proposed framework enables holistic inverse design of origami, considering both shape and function, to build novel reconfigurable structures for various applications such as metamaterials, deployable structures, soft robots, biomedical devices, and many more.


Q-learning with online random forests

arXiv.org Machine Learning

$Q$-learning is the most fundamental model-free reinforcement learning algorithm. Deployment of $Q$-learning requires approximation of the state-action value function (also known as the $Q$-function). In this work, we provide online random forests as $Q$-function approximators and propose a novel method wherein the random forest is grown as learning proceeds (through expanding forests). We demonstrate improved performance of our methods over state-of-the-art Deep $Q$-Networks in two OpenAI gyms (`blackjack' and `inverted pendulum') but not in the `lunar lander' gym. We suspect that the resilience to overfitting enjoyed by random forests recommends our method for common tasks that do not require a strong representation of the problem domain. We show that expanding forests (in which the number of trees increases as data comes in) improve performance, suggesting that expanding forests are viable for other applications of online random forests beyond the reinforcement learning setting.


Variable Grouping Based Bayesian Additive Regression Tree

arXiv.org Machine Learning

Using ensemble methods for regression has been a large success in obtaining high-accuracy prediction. Examples are Bagging, Random forest, Boosting, BART (Bayesian additive regression tree), and their variants. In this paper, we propose a new perspective named variable grouping to enhance the predictive performance. The main idea is to seek for potential grouping of variables in such way that there is no nonlinear interaction term between variables of different groups. Given a sum-of-learner model, each learner will only be responsible for one group of variables, which would be more efficient in modeling nonlinear interactions. We propose a two-stage method named variable grouping based Bayesian additive regression tree (GBART) with a well-developed python package gbart available. The first stage is to search for potential interactions and an appropriate grouping of variables. The second stage is to build a final model based on the discovered groups. Experiments on synthetic and real data show that the proposed method can perform significantly better than classical approaches.


Conversational Sentiment Analysis

#artificialintelligence

I recently built a movie recommender that takes as input a user written passage about liked and/or disliked movies. At the onset of the project I figured that determining which movies users' liked and disliked would be simple. After all, using text to determine whether someone likes or dislike a movie doesn't seem too ambitious. With the variety of packages readily available for sentiment analysis in python, there had to be something available out of the box to do this job. As it turns out, using text to determine whether someone likes vs dislikes a movie, or any named entity, is deceivingly complex.


Several Tunable GMM Kernels

arXiv.org Machine Learning

While tree methods have been popular in practice, researchers and practitioners are also looking for simple algorithms which can reach similar accuracy of trees. In 2010, (Ping Li UAI'10) developed the method of "abc-robust-logitboost" and compared it with other supervised learning methods on datasets used by the deep learning literature. In this study, we propose a series of "tunable GMM kernels" which are simple and perform largely comparably to tree methods on the same datasets. Note that "abc-robust-logitboost" substantially improved the original "GDBT" in that (a) it developed a tree-split formula based on second-order information of the derivatives of the loss function; (b) it developed a new set of derivatives for multi-class classification formulation. In the prior study in 2017, the "generalized min-max" (GMM) kernel was shown to have good performance compared to the "radial-basis function" (RBF) kernel. However, as demonstrated in this paper, the original GMM kernel is often not as competitive as tree methods on the datasets used in the deep learning literature. Since the original GMM kernel has no parameters, we propose tunable GMM kernels by adding tuning parameters in various ways. Three basic (i.e., with only one parameter) GMM kernels are the "$e$GMM kernel", "$p$GMM kernel", and "$\gamma$GMM kernel", respectively. Extensive experiments show that they are able to produce good results for a large number of classification tasks. Furthermore, the basic kernels can be combined to boost the performance.


Making data science accessible - Machine Learning – Tree Methods

@machinelearnbot

Tree methods are commonly used in data science to understand patterns within data and to build predictive models. The term Tree Methods covers a variety of techniques with different levels of complexity but my aim is to highlight three I find useful. To set the problem up let's assume we have a census dataset containing age, education, employment status and so on. Given all this information we want to see if we can predict whether a person earns more than $50k per year. How can tree methods help us?